cluster ward | Insufficient memory for ClusterMatrix

Matthew Campbell

Join Date: Mar 2022
Posts: 8

cluster ward | Insufficient memory for ClusterMatrix | r(950)

10 Mar 2022, 07:32

Dear Statalist,
I have a sample of about 38,000 observations and 9 variables. I want to perform a Ward's linkage cluster analysis. However, whenever I try to execute the "cluster ward" command in Stata, I get the following message:

insufficient memory for ClusterMatrix
r(950);

I have also tried to run the analysis from a server with more than 128 GB of RAM. But I always get the same error message.

How can I solve this issue in your opinion? Below you can find more information about my problem.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(var1 var2 var3 var4 var5 var6 var7 var8) byte gender
       0        0 0   0   1        0        0   1 1
.3333333 .6666667 0   0   0 .3333333 .3333333   1 0
       0       .5 0  .5   0        0        1   1 1
       0        1 0   0   0        0        0   1 1
       0        1 0   0   0        0 .3333333   1 1
     .25      .75 0   0   0      .25      .25   1 0
      .5       .5 0   0   0       .5        1   1 1
       0       .5 0  .5   0       .5       .5   1 0
     .25       .5 0   0 .25      .25      .25   1 1
       0        0 0   1   0        1        0   1 1
       0        1 0   0   0        0        0   0 1
      .2       .6 0  .2   0       .2       .2   1 1
      .5       .5 0   0   0       .5       .5   1 1
       1        0 0   0   0        1        1   1 0
       0       .5 0  .5   0        0        0   1 1
       1        0 0   0   0        1        1   1 0
       0      .75 0 .25   0        0      .25 .75 1
.3333333 .6666667 0   0   0 .6666667 .3333333   1 1
       0        0 1   0   0        0        0   1 0
       0        1 0   0   0        0        1   1 1
end
label values gender gender
label def gender 0 "Men", modify
label def gender 1 "Women", modify

*>> Cluster analysis (Ward method)
cluster ward     ///
var1            ///
var2            ///
var3            ///
var4            ///
var5            ///
var6            ///
var7            ///
var8            ///
if gender==1, name(my_cluster_women)

Last edited by Matthew Campbell; 10 Mar 2022, 07:43.

Tags: cluster, ClusterMatrix, memory, ward

Nick Cox

Join Date: Mar 2014

Posts: 36060
#2

10 Mar 2022, 07:55

That is a big ask for a problem that entails pairwise comparisons.

I would look for clusters in graphs of leading principal components on var?. Alternatively, it could be that you have many duplicates and can slim down the dataset to one with distinct observations. Classification isn't affected by the presence of duplicates, as I understand it.
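
A minimal sketch of the duplicates idea, using a made-up toy dataset with two variables rather than the real data. It counts repeated profiles and then collapses them to distinct rows with -contract-. The name n_obs is an assumption chosen for illustration; keeping the frequency lets you check later whether the duplicates matter for your method.

```stata
* Toy data standing in for the real dataset (values are made up)
clear
input float(var1 var2) byte gender
0  1  1
0  1  1
.5 .5 1
.5 .5 1
.5 .5 1
1  0  1
1  0  0
end

* Keep only the group to be clustered and report how much duplication there is
keep if gender == 1
duplicates report var1 var2

* Collapse to distinct profiles, storing how often each one occurred in n_obs
contract var1 var2, freq(n_obs)
list, noobs
```

If the number of distinct profiles is far smaller than the number of observations, the hierarchical analysis may become feasible on the reduced dataset.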
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

10 Mar 2022, 08:01

The full PDF documentation for the cluster linkage command tells us

Technical note

cluster commands require a significant amount of memory and execution time. With many observations, the execution time may be significant.

38,000 observations qualifies as "many", probably even after you reduce that to those with gender==1.

And the PDF documentation for the overview of the cluster analysis commands tells us

The first step of an agglomerative algorithm considers N(N-1)/2 possible fusions of observations
to find the closest pair. This number grows quadratically with N.

I'm of the opinion that hierarchical cluster analysis is out of the question for a problem of the size you present.
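
To put a number on that, here is a rough calculation for the full sample of 38,000 (the gender==1 subset will be smaller). The 4 and 8 bytes per dissimilarity are assumptions for illustration only, not a statement of how Stata stores ClusterMatrix, and the result does not by itself explain why the error persisted on the 128 GB server.

```stata
* Number of pairwise dissimilarities for n observations: n(n-1)/2
scalar n_all = 38000
scalar pairs = n_all*(n_all - 1)/2
display %15.0fc pairs

* Storage under two assumed sizes per dissimilarity (illustrative only)
display %6.2f pairs*4/1e9 " GB at 4 bytes each"
display %6.2f pairs*8/1e9 " GB at 8 bytes each"
```

That is about 722 million pairs before any fusion takes place.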
Matthew Campbell

Join Date: Mar 2022

Posts: 8
#4

20 Apr 2022, 07:10

Dear Nick Cox and William Lisowski,
thanks a lot for your answers. From what I understood, it was better to leave the hierarchical cluster analysis out. For this reason I opted for partitioning methods (more precisely k-means) as they were less computationally demanding...
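
For anyone landing here with the same error, a k-means call could look something like the sketch below. It runs on simulated data so that it is self-contained; k(4), the seeds, and the names km_women and km_group are arbitrary assumptions for illustration, not the settings actually used in this analysis.

```stata
* Simulated stand-in for the real data (values are random, not real)
clear
set seed 2022
set obs 200
forvalues j = 1/8 {
    generate float var`j' = runiform()
}
generate byte gender = runiform() < 0.5

* k-means on the women only; Euclidean distance, random starting centers
cluster kmeans var1-var8 if gender == 1, k(4) measure(L2) ///
    start(krandom(12345)) name(km_women) generate(km_group)

* Group sizes
tabulate km_group if gender == 1
```

The number of groups has to be chosen in advance, so it is worth trying several values of k() and comparing the solutions.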